INVESTIGATION Inferring Admixture Histories of Human Populations Using Linkage Disequilibrium

Authors

  • Po-Ru Loh
  • Mark Lipson
  • Nick Patterson
  • Priya Moorjani
  • Joseph K. Pickrell
  • David Reich
Abstract

Long-range migrations and the resulting admixtures between populations have been important forces shaping human genetic diversity. Most existing methods for detecting and reconstructing historical admixture events are based on allele frequency divergences or patterns of ancestry segments in chromosomes of admixed individuals. An emerging new approach harnesses the exponential decay of admixture-induced linkage disequilibrium (LD) as a function of genetic distance. Here, we comprehensively develop LD-based inference into a versatile tool for investigating admixture. We present a new weighted LD statistic that can be used to infer mixture proportions as well as dates with fewer constraints on reference populations than previous methods. We define an LD-based three-population test for admixture and identify scenarios in which it can detect admixture events that previous formal tests cannot. We further show that we can uncover phylogenetic relationships among populations by comparing weighted LD curves obtained using a suite of references. Finally, we describe several improvements to the computation and fitting of weighted LD curves that greatly increase the robustness and speed of the calculations. We implement all of these advances in a software package, ALDER, which we validate in simulations and apply to test for admixture among all populations from the Human Genome Diversity Project (HGDP), highlighting insights into the admixture history of Central African Pygmies, Sardinians, and Japanese.

Admixture between previously diverged populations has been a common feature throughout the evolution of modern humans and has left significant genetic traces in contemporary populations (Li et al. 2008; Wall et al. 2009; Reich et al. 2009; Green et al. 2010; Gravel et al. 2011; Pugach et al. 2011; Patterson et al. 2012).
Resulting patterns of variation can provide information about migrations, demographic histories, and natural selection and can also be a valuable tool for association mapping of disease genes in admixed populations (Patterson et al. 2004).

Recently, a variety of methods have been developed to harness large-scale genotype data to infer admixture events in the history of sampled populations, as well as to estimate a range of gene flow parameters, including ages, proportions, and sources. Some of the most popular approaches, such as STRUCTURE (Pritchard et al. 2000) and principal component analysis (PCA) (Patterson et al. 2006), use clustering algorithms to identify admixed populations as intermediates in relation to surrogate ancestral populations. In a somewhat similar vein, local ancestry inference methods (Tang et al. 2006; Sankararaman et al. 2008; Price et al. 2009; Lawson et al. 2012) analyze chromosomes of admixed individuals with the goal of recovering continuous blocks inherited directly from each ancestral population. Because recombination breaks down ancestry tracts through successive generations, the time of admixture can be inferred from the tract length distribution (Pool and Nielsen 2009; Pugach et al. 2011; Gravel 2012), with the caveat that accurate local ancestry inference becomes difficult when tracts are short or the reference populations used are highly diverged from the true mixing populations. A third class of methods makes use of allele frequency differentiation among populations to deduce the presence of admixture and estimate parameters, either with likelihood-based models (Chikhi et al. 2001; Wang 2003; Sousa et al. 2009; Wall et al. 2009; Laval et al. 2010; Gravel et al. 2011) or with phylogenetic trees built by taking moments of the site-frequency spectrum over large sets of SNPs (Reich et al. 2009; Green et al. 2010; Patterson et al. 2012; Pickrell and Pritchard 2012; Lipson et al. 2012). For example, f-statistic-based three- and four-population tests for admixture (Reich et al. 2009; Green et al. 2010; Patterson et al. 2012) are highly sensitive in the proper parameter regimes and when the set of sampled populations sufficiently represents the phylogeny. One disadvantage of drift-based statistics, however, is that because the rate of genetic drift depends on population size, these methods do not allow for inference of the time that has elapsed since admixture events.

Copyright © 2013 by the Genetics Society of America. doi: 10.1534/genetics.112.147330. Manuscript received October 31, 2012; accepted for publication January 25, 2013. Available freely online through the author-supported open access option. Supporting information is available online at http://www.genetics.org/lookup/suppl/doi:10.1534/genetics.112.147330/-/DC1. These authors contributed equally to this work. Corresponding authors: Department of Genetics, Harvard Medical School, 77 Ave. Louis Pasteur, New Research Bldg., 260I, Boston, MA 02115. E-mail: reich@genetics.med.harvard.edu; and Department of Mathematics, 2-373, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139. E-mail: [email protected]. Genetics, Vol. 193, 1233-1254, April 2013.

Finally, Moorjani et al. (2011) recently proposed a fourth approach, using associations between pairs of loci to make inference about admixture, which we further develop in this article. In general, linkage disequilibrium (LD) in a population can be generated by selection, genetic drift, or population structure, and it is eroded by recombination. Within a homogeneous population, steady-state neutral LD is maintained by the balance of drift and recombination, typically becoming negligible in humans at distances of more than a few hundred kilobases (Reich et al. 2001; International HapMap Consortium 2007).
Even if a population is currently well mixed, however, it can retain longer-range admixture LD (ALD) from admixture events in its history involving previously separated populations. ALD is caused by associations between nearby loci co-inherited on an intact chromosomal block from one of the ancestral mixing populations (Chakraborty and Weiss 1988). Recombination breaks down these associations, leaving a signature of the time elapsed since admixture that can be probed by aggregating pairwise LD measurements through an appropriate weighting scheme; the resulting weighted LD curve (as a function of genetic distance) exhibits an exponential decay with rate constant giving the age of admixture (Moorjani et al. 2011; Patterson et al. 2012). This approach to admixture dating is similar in spirit to strategies based on local ancestry, but LD statistics have the advantage of a simple mathematical form that facilitates error analysis.

In this article, we comprehensively develop LD-based admixture inference, extending the methodology to several novel applications that constitute a versatile set of tools for investigating admixture. We first propose a cleaner functional form of the underlying weighted LD statistic and provide a precise mathematical development of its properties. As an immediate result of this theory, we observe that our new weighted LD statistic can be used to infer mixture proportions as well as dates, extending the results of Pickrell et al. (2012). Moreover, such inference can still be performed (albeit with reduced power) when data are available from only the admixed population and one surrogate ancestral population, whereas all previous techniques require at least two such reference populations. As a second application, we present an LD-based three-population test for admixture with sensitivity complementary to the three-population f-statistic test (Reich et al. 2009; Patterson et al. 2012) and characterize the scenarios in which each is advantageous.
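The dating idea described above, an exponentially decaying weighted LD curve whose rate constant gives the age of admixture, can be illustrated with a minimal sketch. It is not ALDER's fitting procedure. It assumes a single admixture pulse, genetic distance d measured in Morgans (so that the rate constant is in generations), a curve of the form a(d) = A * exp(-n * d) with no constant offset, and strictly positive, noise-free values. The function name `fit_admixture_date` and the synthetic numbers are hypothetical.

```python
import numpy as np

def fit_admixture_date(d_morgans, weighted_ld):
    """Fit a(d) ~ A * exp(-n * d) by least squares on log a(d).

    Returns (n_generations, amplitude). Assumes every weighted LD value is
    positive and that the curve has no constant (affine) offset.
    """
    d = np.asarray(d_morgans, dtype=float)
    a = np.asarray(weighted_ld, dtype=float)
    if np.any(a <= 0):
        raise ValueError("log-linear fit needs strictly positive LD values")
    # log a(d) = log A - n * d, so a straight-line fit recovers n and A.
    slope, intercept = np.polyfit(d, np.log(a), 1)
    return float(-slope), float(np.exp(intercept))

# Synthetic, noise-free curve (illustrative values only):
# 40 generations since admixture, distances from 0.5 cM to about 20 cM.
d = np.arange(0.005, 0.2, 0.001)
curve = 0.002 * np.exp(-40.0 * d)
n_hat, amp_hat = fit_admixture_date(d, curve)
```

With real data the curve is noisy and can cross zero at large distances, so a nonlinear least-squares fit that also allows a constant term would usually be a more suitable choice than this log-linear shortcut.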
We further show that phylogenetic relationships between true mixing populations and present-day references can be inferred by comparing weighted LD curves using weights derived from a suite of reference populations. Finally, we describe several improvements to the computation and fitting of weighted LD curves: we show how to detect confounding LD from sources other than admixture, improving the robustness of our methods in the presence of such effects, and we present a novel fast Fourier transform-based algorithm for weighted LD computation that reduces typical run times from hours to seconds. We implement all of these advances in a software package, ALDER (Admixture-induced Linkage Disequilibrium for Evolutionary Relationships). We demonstrate the performance of ALDER by using it to test for admixture among all HGDP populations (Li et al. 2008) and compare its results to those of the three-population test, highlighting the sensitivity trade-offs of each approach. We further illustrate our methodology with case studies of Central African Pygmies, Sardinians, and Japanese, revealing new details that add to our understanding of admixture events in the history of each population.
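To give a sense of why a fast Fourier transform can speed up this kind of computation, the sketch below shows the general technique only. It does not reproduce the algorithm used in ALDER. It assumes a hypothetical vector `v` of per-bin values on an evenly spaced genetic-distance grid (for example, weighted genotype contributions of one individual summed within each bin) and computes, for every lag k, the sum of products of values k bins apart. The direct method costs O(n * max_lag) operations; the FFT autocorrelation costs O(n log n).

```python
import numpy as np

def lag_sums_direct(v, max_lag):
    """Sum of v[i] * v[i + k] over i, for each lag k = 0..max_lag."""
    v = np.asarray(v, dtype=float)
    return np.array([np.dot(v[: len(v) - k], v[k:]) for k in range(max_lag + 1)])

def lag_sums_fft(v, max_lag):
    """Same lag sums computed as an autocorrelation via the FFT."""
    v = np.asarray(v, dtype=float)
    n = len(v)
    # Zero-pad to a power of two of at least 2n - 1 so that the circular
    # correlation does not wrap around for lags below n.
    size = 1 << (2 * n - 1).bit_length()
    spectrum = np.fft.rfft(v, size)
    autocorr = np.fft.irfft(spectrum * np.conj(spectrum), size)
    return autocorr[: max_lag + 1]

# Hypothetical binned data: 500 grid bins of random values.
rng = np.random.default_rng(0)
v = rng.normal(size=500)
direct = lag_sums_direct(v, 100)
fast = lag_sums_fft(v, 100)
```

Both functions return the same lag sums up to floating-point rounding. Note that lag 0 includes each bin paired with itself, which a real pairwise LD computation would need to handle separately.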


Publication date: 2013